Unifying Foundation Models with Quadrotor Control for Visual Tracking Beyond Object Categories
Visual control enables quadrotors to adaptively navigate using real-time
sensory data, bridging perception with action. Yet, challenges persist,
including generalizing across scenarios, maintaining reliability, and
ensuring real-time responsiveness. This paper introduces a perception framework
grounded in foundation models for universal object detection and tracking,
moving beyond specific training categories. Integral to our approach is a
multi-layered tracker integrated with the foundation detector, ensuring
continuous target visibility, even when faced with motion blur, abrupt light
shifts, and occlusions. Complementing this, we introduce a model-free
controller tailored for resilient quadrotor visual tracking. Our system
operates efficiently on limited hardware, relying solely on an onboard camera
and an inertial measurement unit. Through extensive validation in diverse,
challenging indoor and outdoor environments, we demonstrate our system's
effectiveness and adaptability. In conclusion, our research represents a step
forward in quadrotor visual tracking, moving from task-specific methods to more
versatile and adaptable operations.